Improved contact prediction in proteins: Using pseudolikelihoods to infer Potts models
Spatially proximate amino acids in a protein tend to coevolve. A protein's
three-dimensional (3D) structure hence leaves an echo of correlations in the
evolutionary record. Reverse engineering 3D structures from such correlations
is an open problem in structural biology, pursued with increasing vigor as more
and more protein sequences continue to fill the data banks. Within this task
lies a statistical inference problem, rooted in the following: correlation
between two sites in a protein sequence can arise from firsthand interaction
but can also be network-propagated via intermediate sites; observed correlation
is not enough to guarantee proximity. To separate direct from indirect
interactions is an instance of the general problem of inverse statistical
mechanics, where the task is to learn model parameters (fields, couplings) from
observables (magnetizations, correlations, samples) in large systems. In the
context of protein sequences, the approach has been referred to as
direct-coupling analysis. Here we show that the pseudolikelihood method,
applied to 21-state Potts models describing the statistical properties of
families of evolutionarily related proteins, significantly outperforms existing
approaches to the direct-coupling analysis, the latter being based on standard
mean-field techniques. This improved performance also relies on a modified
score for the coupling strength. The results are verified using known crystal
structures of specific sequence instances of various protein families. Code
implementing the new method can be found at http://plmdca.csc.kth.se/.
Comment: 19 pages, 16 figures, published version
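The abstract names the method but does not spell out the objective or the modified score. Here is a minimal sketch, assuming a q-state Potts model with fields h and couplings J. For each site r, the pseudolikelihood replaces the full likelihood with the conditional probability of x_r given the rest of the sequence. That conditional only needs a sum over q states, not the intractable partition function. The score assumes a zero-sum-gauge Frobenius norm followed by the average-product correction (APC). This is how the plmDCA score is commonly described, but treat it as an assumption, not a transcription of the released code. Function names are hypothetical, and regularisation and the optimiser are omitted.

```python
import numpy as np

def site_neg_log_pl(r, h_r, J_r, X):
    """Mean negative log-pseudolikelihood of site r in a q-state Potts model.

    X   : (B, N) int array of aligned sequences, states coded 0..q-1
    h_r : (q,) fields at site r
    J_r : (N, q, q) couplings; J_r[j, a, b] couples state a at r with state b at j
          (the slice J_r[r] is ignored)
    """
    B, N = X.shape
    # logits[n, a] = h_r[a] + sum_{j != r} J_r[j, a, X[n, j]]
    logits = np.tile(np.asarray(h_r, dtype=float), (B, 1))
    for j in range(N):
        if j != r:
            logits += J_r[j][:, X[:, j]].T
    # numerically stable log of the local normalising sum over the q states
    m = logits.max(axis=1, keepdims=True)
    log_z = m[:, 0] + np.log(np.exp(logits - m).sum(axis=1))
    observed = logits[np.arange(B), X[:, r]]
    return float(np.mean(log_z - observed))

def apc_frobenius_scores(J):
    """Contact scores from couplings J of shape (N, N, q, q) (assumed scoring).

    Each block J[i, j] is shifted to the zero-sum gauge, reduced to its
    Frobenius norm, then corrected with the average-product correction."""
    Jg = (J - J.mean(axis=2, keepdims=True) - J.mean(axis=3, keepdims=True)
          + J.mean(axis=(2, 3), keepdims=True))
    F = np.sqrt((Jg ** 2).sum(axis=(2, 3)))
    N = F.shape[0]
    np.fill_diagonal(F, 0.0)               # self-pairs are not contacts
    row = F.sum(axis=1) / (N - 1)          # mean score of each site over j != i
    total = F.sum() / (N * (N - 1))        # mean over all off-diagonal pairs
    S = F - np.outer(row, row) / total     # APC
    np.fill_diagonal(S, 0.0)
    return S
```

In a full fit, the summed per-site objectives plus a penalty would go to a gradient-based optimiser. The highest off-diagonal entries of the score matrix are then read off as predicted contacts.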
Inverse Ising inference using all the data
We show that a method based on logistic regression, using all the data,
solves the inverse Ising problem far better than mean-field calculations
relying only on sample pairwise correlation functions, while remaining
computationally feasible for hundreds of nodes. The largest improvement in
reconstruction occurs for strong interactions. Using two examples, a diluted
Sherrington-Kirkpatrick model and a two-dimensional lattice, we also show that
interaction topologies can be recovered from few samples with good accuracy and
that the use of ℓ1-regularization is beneficial in this process, pushing
inference abilities further into low-temperature regimes.
Comment: 5 pages, 2 figures. Accepted version
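The abstract does not give the estimator in detail. Here is a minimal sketch, assuming ±1 spins and the conditional P(s_i | rest) = exp(s_i m_i) / (2 cosh m_i), where m_i = h_i + Σ_j J_ij s_j. It fits one row of the coupling matrix by logistic regression with an ℓ1 penalty (assumed), using plain proximal gradient descent (ISTA). The function name, step size, penalty strength and iteration count are hypothetical choices, not the authors' settings. A production fit would more likely use an established solver.

```python
import numpy as np

def fit_spin_l1(S, i, lam=0.01, lr=0.1, n_iter=2000):
    """Estimate the field h_i and coupling row J_i. of spin i from samples S.

    S : (B, N) array with entries in {-1, +1}
    Model (assumed): P(s_i | rest) = exp(s_i m_i) / (2 cosh m_i),
                     m_i = h_i + sum_{j != i} J_ij s_j
    Objective: mean(-log P(s_i | rest)) + lam * sum_j |J_ij|, minimised by
    proximal gradient descent (ISTA); the field h_i is not penalised.
    """
    S = np.asarray(S, dtype=float)
    B, N = S.shape
    y = S[:, i]                      # target spin
    X = np.delete(S, i, axis=1)      # all other spins act as features
    h, w = 0.0, np.zeros(N - 1)
    for _ in range(n_iter):
        m = h + X @ w                # local field for every sample
        g = np.tanh(m) - y           # d(-log P)/dm for each sample
        h -= lr * g.mean()
        w -= lr * (X.T @ g) / B
        # soft-thresholding: the proximal step of the L1 penalty
        w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)
    return h, np.insert(w, i, 0.0)   # J_ii set to 0 to give a full-length row
```

Unlike a mean-field inversion, each fit sees every sample, not just the pairwise correlation matrix. Running it for each i and then averaging J_ij with J_ji is one simple way to get a symmetric coupling estimate.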
PoGOLite - A High Sensitivity Balloon-Borne Soft Gamma-ray Polarimeter
We describe a new balloon-borne instrument (PoGOLite) capable of detecting
10% polarisation from 200 mCrab point-like sources between 25 and 80 keV in one 6-hour
flight. Polarisation measurements in the soft gamma-ray band are expected
to provide a powerful probe into high-energy emission mechanisms as well as the
distribution of magnetic fields, radiation fields and interstellar matter. At
present, only exploratory polarisation measurements have been carried out in
the soft gamma-ray band. Reduction of the large background produced by
cosmic-ray particles has been the biggest challenge. PoGOLite uses Compton
scattering and photo-absorption in an array of 217 well-type phoswich detector
cells made of plastic and BGO scintillators surrounded by a BGO anticoincidence
shield and a thick polyethylene neutron shield. The narrow field of view (FOV, 1.25 msr)
obtained with well-type phoswich detector technology and the use of thick
background shields improve the detected signal-to-noise ratio (S/N). Event selections based on
recorded phototube waveforms and Compton kinematics reduce the background to
that expected for a 40-100 mCrab source between 25 and 50 keV. A 6-hour
observation of the Crab will differentiate between the Polar Cap/Slot Gap,
Outer Gap, and Caustic models at greater than 5σ significance, and will also cleanly
identify the Compton reflection component in the Cygnus X-1 hard state. The
first flight is planned for 2010, and long-duration flights from Sweden to
Northern Canada are foreseen thereafter.
Comment: 11 pages, 11 figures, 2 tables
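The abstract does not describe the analysis chain. The sketch below shows two standard Compton-polarimetry steps, assuming the usual modulation-curve model N(φ) = A[1 + μ cos 2(φ − φ0)] over azimuthal scattering angle φ. The first step fits that curve by linear least squares. The fitted phase φ0 marks the direction of most frequent scattering, which for Compton scattering lies perpendicular to the polarisation vector. The second step evaluates the textbook minimum detectable polarisation at 99% confidence, MDP = 4.29 / (μ100 R_S) · sqrt((R_S + R_B) / T). This is the kind of figure behind a claim such as "10% polarisation from 200 mCrab in 6 hours". Function names are hypothetical, and no PoGOLite rates or modulation factors are assumed.

```python
import numpy as np

def fit_modulation(phi, counts):
    """Fit N(phi) = A * (1 + mu * cos(2 * (phi - phi0))) by linear least squares.

    Uses A*mu*cos(2(phi - phi0)) = a1*cos(2 phi) + a2*sin(2 phi), with
    a1 = A mu cos(2 phi0) and a2 = A mu sin(2 phi0)."""
    phi = np.asarray(phi, dtype=float)
    X = np.column_stack([np.ones_like(phi), np.cos(2 * phi), np.sin(2 * phi)])
    (a0, a1, a2), *_ = np.linalg.lstsq(X, np.asarray(counts, dtype=float), rcond=None)
    mu = np.hypot(a1, a2) / a0          # measured modulation
    phi0 = 0.5 * np.arctan2(a2, a1)     # phase of maximum scattering
    return a0, mu, phi0

def mdp99(mu100, src_rate, bkg_rate, t_obs):
    """Minimum detectable polarisation at 99% confidence.

    mu100    : modulation for a 100% polarised beam
    src_rate : source count rate R_S; bkg_rate : background rate R_B
    t_obs    : observation time T (same time unit as the rates)"""
    return 4.29 / (mu100 * src_rate) * np.sqrt((src_rate + bkg_rate) / t_obs)
```

The polarisation fraction follows as μ / μ100. The MDP formula shows why cutting the background rate R_B matters as much as collecting more source counts.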
Detecting contacts in protein folds by solving the inverse Potts problem - a pseudolikelihood approach
Spatially proximate amino acid positions in a protein tend to co-evolve, so a protein's 3D structure leaves an echo of correlations in the evolutionary record. Reverse engineering 3D structures from such correlations is an open problem in structural biology, pursued with increasing vigor as new protein sequences continue to fill the data banks. Within this task lies a statistical stumbling block, rooted in the following: correlation between two amino acid positions can arise from firsthand interaction but can also be network-propagated via intermediate positions; observed correlation is not enough to guarantee proximity. The remedy, and the focus of this thesis, is to mathematically untangle the crisscross of correlations and extract the direct interactions, which enables a clean depiction of co-evolution among the positions. Recently, analysts have used maximum-entropy modeling to recast this cause-and-effect puzzle as parameter learning in a Potts model (a kind of Markov random field). Unfortunately, the computationally expensive partition function puts this out of reach of straightforward maximum-likelihood estimation. Mean-field approximations have been used, but an arsenal of other approximate schemes exists. In this work, we re-implement an existing contact-detection procedure and replace its mean-field calculations with pseudolikelihood maximization. We then feed both routines real protein data and highlight differences between their respective outputs. Our new program seems to offer a systematic boost in detection accuracy.
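For contrast with the pseudolikelihood route, the naive mean-field inversion that the thesis replaces can be sketched in a few lines. It uses only the connected correlation matrix C and sets J_ij = −(C⁻¹)_ij for i ≠ j. For brevity, this sketch assumes binary ±1 (Ising) variables rather than the 21-state Potts encoding. In the Potts setting, the same formula is typically applied to a one-hot covariance matrix with one state dropped per site. The diagonal shift `reg` is a hypothetical regulariser standing in for the pseudocounts used in practice.

```python
import numpy as np

def naive_mean_field_couplings(S, reg=1e-3):
    """Naive mean-field couplings J_ij = -(C^{-1})_ij, i != j, from ±1 samples.

    S   : (B, N) array of samples
    reg : small diagonal shift that keeps C invertible (hypothetical choice)
    """
    C = np.cov(np.asarray(S, dtype=float), rowvar=False, bias=True)  # connected correlations
    C += reg * np.eye(C.shape[0])
    J = -np.linalg.inv(C)
    np.fill_diagonal(J, 0.0)        # self-couplings are not part of the model
    return J
```

The whole estimate costs one matrix inversion, which explains its appeal. Because it discards everything in the data beyond pairwise correlations, it tends to be less accurate for strong couplings than likelihood-based fits.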